Skip to content

gh-144370: Disallow usage of control characters in header names, values and statuses in Lib/wsgiref/handlers.py for security#144371

Open
benediktjohannes wants to merge 39 commits intopython:mainfrom
benediktjohannes:patch-4
Open

gh-144370: Disallow usage of control characters in header names, values and statuses in Lib/wsgiref/handlers.py for security#144371
benediktjohannes wants to merge 39 commits intopython:mainfrom
benediktjohannes:patch-4

Conversation

@benediktjohannes
Copy link
Contributor

@benediktjohannes benediktjohannes commented Jan 31, 2026

This PR is opened as public as the primary described CVE is already publicated and this only contains one certain point that has to be changed in code in Python and refers to issue #144370 (gh-144370).

In Lib/wsgiref/handlers.py at 260 (def _convert_string_type(self, value, title):) it is necessary to add the same fix as in #143916 requested as well (credits to @sethmlarson 👍). This references #143917 and #144118 and already includes the latest fix for the inclusion of HTAB, additions to this in form of tests will automatically be added by merging #144118.

There are some differences to wsgiref.headers.Headers that I made because I chose the default status of "name" to be "True" instead of false as "True" includes one more disallowed character for safety in future use cases if not defined otherwise, but I also added "name=True" even if not necessary in use cases just for safety for another case (if the default case was changed in order to produce similarity to wsgiref.headers.Headers while I'd recommend to change wsgiref.headers.Headers). I've also made an addition to the raised "ValueError" including "values" (without a further description of the very exact in- and exclusions like HTAB because this doesn't seem to be user-sided relevant to me) and I've used a slightly shorter internal form for the variables (which can be changed if wanted, but it's just style).

Important note: I was / am not quite sure about these changes as they were not tested, but I'm highly confident that this should be correct as it (mostly) bases on changes which were already made in other PRs (or at least in one other PR because this is the only merged one (but this also refers to a "non-merged" PR, so has to be concretized in that way)). And another important aspect is that I was not quite sure about the "status" (whether my classification is correct here as a "non-name value" when considering the handling of the characters, so please correct me if I'm mistaken (and I wasn't quite sure as well at the question whether potentially there should be added some different disallowed characters for this or some disallowed be removed, so please correct me if I'm mistaken)). Please also correct me if I'm mistaken in general anywhere in this PR (maybe this isn't even necessary for this case and I'm completely mistaken because this wasn't tested, but I'm pretty confident just from looking at the code).

@benediktjohannes
Copy link
Contributor Author

Important note: This PR needs backport to 3.10, 3.11, 3.12, 3.13 and 3.14 please, thanks! 👍 (I'll probably add these tomorrow if there are no concerns.)

@benediktjohannes
Copy link
Contributor Author

@gpshead or @StanFromIreland / @ZeroIntensity please review, thanks!

@picnixz
Copy link
Member

picnixz commented Jan 31, 2026

What is different from #144118?

@picnixz picnixz closed this Jan 31, 2026
@picnixz
Copy link
Member

picnixz commented Jan 31, 2026

I would suggest that you directly comment on Seth's PR instead.

@benediktjohannes
Copy link
Contributor Author

@picnixz I guess that this was a misunderstanding, this PR is about a completely different thing that is changes. This is not a duplicate of the other PR because the other one is about adding HTAB again to Lib/wsgiref/headers.py, but this one is about Lib/wsgiref/handlers.py, not headers.py. This is something completely different because the other one only adds one character to be allowed, but this one reguards disallowing several characters in handlers.py (and not headers.py) which wasn‘t mentioned at all in the other PR because this concerns completely different aspects like status and some changes in debug mode in handlers.py and not headers.py. Please reopen. Thanks!

@benediktjohannes
Copy link
Contributor Author

@picnixz I guess, I know why this seemed to be a duplicate as I described the „differences“ to the other PR and then you thought that this was about some changes that I suggested for the other PR, but this is not what this PR is about because this is about something completely different in another location, so please reopen. Thanks!

@benediktjohannes
Copy link
Contributor Author

benediktjohannes commented Feb 1, 2026

And I think that is would not be a very good idea if we combined these changes in another PR because the first one was already merged two weeks ago and the other one is only about adding allowed phrases again while this one is about the opposit (disallowing phrases) with the only thing in common that this one does not disallow one thing which will be added in the new PR again, but both other PRs concern another location which makes this not even a duplication in that way, so please reopen. Thanks!

cc @gpshead @sethmlarson @StanFromIreland @ZeroIntensity

@gpshead gpshead reopened this Feb 1, 2026
@benediktjohannes
Copy link
Contributor Author

Thank you very much!

Copy link
Member

@vstinner vstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should write tests for theses changes.

"Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

_name_disallowed = re.compile(r'[\x00-\x1F\x7F]')
_value_disallowed = re.compile(r'[\x00-\x08\x0A-\x1F\x7F]')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to wait until PR gh-144118 is merged, and then get these regexs from wsgiref.headers (line 4).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice idea! One other idea that came to my mind when reading this is that we could theoretically use the same converter for wsgiref.headers and wsgiref.handlers (if we already use the same regexes here), but maybe something will be changed in one and then it's unclear that it's used there too, so probably a good idea to only use these twice because they should (probably) always be updated in the same way and it should be more secure to only have to update them once so that they aren't forgotten in the other place (like they were before I found them in this place), so that's a nice idea for future security, very good engineering!

Unforetunately, I can't add this before the other PR is merged because the tests won't work, so for now I have to let this be the same or theoretically I could also let this the same here and add a suggestion to the other PR so that it includes the regexes from this one because it seems to be urgent to me to merge this as it's a security update while the other PR only adds one character back (which is not that urgent) and then I could just add this as a suggestion to the other one.

What's your opinion on this @vstinner ? Thanks!

Co-authored-by: Victor Stinner <vstinner@python.org>
@benediktjohannes
Copy link
Contributor Author

Hi @vstinner thank you very much for your review! I really like your suggestions and I'll make some commits in order to get them in the PR soon, thank you very much!

benediktjohannes and others added 2 commits February 4, 2026 16:47
Co-authored-by: Victor Stinner <vstinner@python.org>
@benediktjohannes
Copy link
Contributor Author

(Added the "re" again because it's included in vstinner's regex and also in sethmlarson's PR so if this will be used in both cases then it's consistent in that way and it would take too much time to change this there as well and the suggestions by seth and vstinner were both with the "re", so I've added it)

@benediktjohannes
Copy link
Contributor Author

I'm not quite sure about the status whether it should be handled as a value or a name, so please correct me if I'm mistaken inthis case (or even none of both? but I don't think that)

@benediktjohannes
Copy link
Contributor Author

You should write tests for theses changes.

Tests are basically already included by seth's PR, but I can of course add similar ones to handlers as well.

@vstinner
Copy link
Member

vstinner commented Feb 4, 2026

Tests are basically already included by seth's PR, but I can of course add similar ones to handlers as well.

Please add new tests on handlers.

@benediktjohannes
Copy link
Contributor Author

@vstinner please re-review, thanks! (I'm still not quite sure about it because I haven't tested it (also the tests are not tested) or is this automatically done by the lint check?)

@benediktjohannes
Copy link
Contributor Author

(but even if this is done by the lint check automatically, I'm not quite sure about the handling of the status)

@benediktjohannes
Copy link
Contributor Author

and thanks for the review by the way 👍

@benediktjohannes
Copy link
Contributor Author

Ahh, this seems actually to be checked automatically by the Tests based on what I've seen! Nice! But I'm not quite sure about the correct classification of the status nevertheless, so please have a look there, thanks!

@benediktjohannes
Copy link
Contributor Author

Ah, wrong way around, names is more strict, sorry

@benediktjohannes benediktjohannes changed the title gh-144370: Disallow usage of control characters in status, headers and values for security gh-144370: Disallow usage of control characters in header names, values and statuses for security Feb 6, 2026
@benediktjohannes benediktjohannes changed the title gh-144370: Disallow usage of control characters in header names, values and statuses for security gh-144370: Disallow usage of control characters in header names, values and statuses in Lib/wsgiref/handlers.py for security Feb 6, 2026
@benediktjohannes
Copy link
Contributor Author

I've now imported the regexes out of headers.py and hope that everything's fine now. 👍 There are two things I want to mention: It now seems to be the case that in debug mode the name and value are checked twice because they are already checked indirectly via headers.py, but I think that this is fine because an extra layer of security if this is changed in any way in future should be positive and I don't think (while it was not tested) that this costs that much of performance (and it should only be relevant in debug mode I guess). And one other thing: I'm not quite sure based up on the security reports whether statuses should be included when disallowing the c0 control characters, but I do think so because they're normally never a good thing and always likely to be used for injections, but I'm still not quite sure about another thing because like we've already said statuses also may include a message (which makes them even more risky for injections I guess, so this fix should be even more relevant I think) which could theoretically contain HTAB and I'm not quite sure about the handling in this case whether it should be handled as a name or a value. I've now set it to the more strict one because this should be safer to go, but it could theoretically cause problems if HTAB should be allowed, but I think that it's probably better to handle it this way and if there is really a important use case for it and it should really be allowed (and I'm mistaken with my idea that it's probably unnecessary and probably even risky to include it) then it would probably be reported via an issue like in seth's case, so I think that this is the best solution (and otherwise we won't even ever know whether it would have been better to disallow it and I think overall based up on the explainations before that it's better to disallow it).

@benediktjohannes benediktjohannes marked this pull request as ready for review February 12, 2026 21:19
@benediktjohannes
Copy link
Contributor Author

Now should be ready for review, please also write your opinion (if different to mine) to the things mentioned above, thanks! And thanks for helping out and adding some commits @vstinner and @sethmlarson! 👍

benediktjohannes and others added 3 commits February 13, 2026 13:54
Co-authored-by: Victor Stinner <vstinner@python.org>
…fp9m8t.rst

Co-authored-by: Victor Stinner <vstinner@python.org>
@benediktjohannes
Copy link
Contributor Author

Unforetunately since the suggested change the lint seems to fail, I'll investigate on this

@benediktjohannes
Copy link
Contributor Author

Ah, seems like it's because of the ` where on one side is one and on the other are two

@benediktjohannes
Copy link
Contributor Author

It should only have one ` I guess (the other PRs also only include one always, so I think that this is correct)

@benediktjohannes
Copy link
Contributor Author

I've changed the news a little bit to "to prevent injections." because it's not only header, but also status injections and thank you very much for your review @vstinner and thanks for the suggestions, both are good! 👍

@benediktjohannes
Copy link
Contributor Author

And please also have a look at the comment above concerning the handling of the status and the status message because of HTAB, thanks! @vstinner

@benediktjohannes
Copy link
Contributor Author

Edit: I've added @sethmlarson to ACKS

@vstinner
Copy link
Member

I've changed the news a little bit to "to prevent injections." because it's not only header, but also status injections and thank you very much for your review @vstinner and thanks for the suggestions, both are good! 👍

Inserting newlines in a status is mostly useful to inject a HTTP header, no?

with self.subTest(c0):
base = BaseHandler()
headers = [('x','y')]
self.assertRaises(ValueError, base.start_response, f"{c0}", headers)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add tests on header keys and header names.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ignore my comment if you implement my second comment on only checking status in Lib/wsgiref/handlers.py :-)

self.status = status
self.headers = self.headers_class(headers)
status = self._convert_string_type(status, "Status")
status = self._convert_string_type(status, "Status", name=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to only check the status using a regex here to leave _convert_string_type() unchanged?

with self.subTest(c0):
base = BaseHandler()
headers = [('x','y')]
self.assertRaises(ValueError, base.start_response, f"{c0}", headers)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ignore my comment if you implement my second comment on only checking status in Lib/wsgiref/handlers.py :-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants